Prediction of lysine propionylation sites using biased SVM and incorporating four different sequence features into Chou's PseAAC

J Mol Graph Model. 2017 Sep:76:356-363. doi: 10.1016/j.jmgm.2017.07.022. Epub 2017 Jul 25.

Abstract

Lysine propionylation is an important and common protein acylation modification in both prokaryotes and eukaryotes. To better understand the molecular mechanism of propionylation, it is important to accurately identify propionylated substrates and their corresponding propionylation sites. In this study, a novel bioinformatics tool named PropPred is developed to predict propionylation sites by combining multiple feature extraction methods with a biased support vector machine. On the one hand, four kinds of sequence features are incorporated: amino acid composition, amino acid factors, binary encoding, and the composition of k-spaced amino acid pairs. The F-score feature ranking method and the incremental feature selection algorithm are then used to remove redundant features. On the other hand, the biased support vector machine algorithm is used to handle the class imbalance in the propionylation site training dataset. In 10-fold cross-validation, PropPred achieves a sensitivity of 70.03%, a specificity of 75.61%, an accuracy of 75.02% and a Matthews correlation coefficient of 0.3085. Feature analysis shows that some amino acid factors play the most important roles in the prediction of propionylation sites. These analyses and prediction results might provide some clues for understanding the molecular mechanisms of propionylation. A user-friendly web server for PropPred is available at 123.206.31.171/PropPred/.
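
As an illustration of two of the steps named in the abstract, the following is a minimal Python sketch of a k-spaced amino acid pair encoding and of an F-score for a single feature. It is not the PropPred implementation. The function names `cksaap` and `f_score`, the default `k_max=2`, and the choice to skip pairs containing non-standard residues are assumptions made for the example, and the F-score formula is the definition commonly used in the feature selection literature, which the abstract does not spell out.

```python
from itertools import product
from statistics import mean, variance

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature indices are stable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(peptide, k_max=2):
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    Returns 400 * (k_max + 1) frequencies. For each gap k, a pair is formed
    from residues i and i + k + 1, and counts are divided by the number of
    such positions in the peptide. (k_max=2 is an assumed default.)
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_positions = len(peptide) - k - 1
        for i in range(n_positions):
            pair = peptide[i] + peptide[i + k + 1]
            # Assumption: pairs with padding or unknown residues are ignored.
            if pair in counts:
                counts[pair] += 1
        for p in PAIRS:
            features.append(counts[p] / n_positions if n_positions > 0 else 0.0)
    return features


def f_score(pos_values, neg_values):
    """F-score of one feature from its values in the two classes.

    Numerator: squared distances of the class means from the overall mean.
    Denominator: sum of the two within-class sample variances.
    Each class needs at least two values.
    """
    overall = mean(list(pos_values) + list(neg_values))
    between = (mean(pos_values) - overall) ** 2 + (mean(neg_values) - overall) ** 2
    within = variance(pos_values) + variance(neg_values)
    return between / within if within > 0 else 0.0
```

In a pipeline of the kind the abstract describes, features would be ranked by such a score and then added one at a time in that order, keeping the subset that gives the best cross-validated performance.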

Keywords: Biased support vector machine; Feature extraction; Incremental feature selection; Post-translational modification; Propionylation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Acetylation
  • Algorithms
  • Amino Acid Sequence
  • Amino Acids / chemistry
  • Computational Biology / methods*
  • Lysine / chemistry*
  • Peptides / chemistry
  • Position-Specific Scoring Matrices
  • Protein Processing, Post-Translational
  • Reproducibility of Results
  • Support Vector Machine*

Substances

  • Amino Acids
  • Peptides
  • Lysine